We study an optimal process control problem with multiple assignable causes. The process is initially in control
but is subject to random transition to one of multiple out-of-control states due to assignable causes.
The objective is to find an optimal stopping rule under partial observation that maximizes the total expected
reward over an infinite horizon. The problem is formulated as a partially observable Markov decision process
(POMDP) whose belief space consists of state probability vectors. New observations are obtained at a
fixed sampling interval and used to update the belief vector by Bayes' theorem. Under standard assumptions, we
show that a conditional control limit policy is optimal and that there exists a convex, non-increasing control
limit that partitions the belief space into two individually connected control regions: a stopping region and
a continuation region. We further derive analytical bounds for the control limit. An algorithm based on the
structural results is devised, which considerably reduces the computation. We also shed light on the selection
of the optimal fixed sampling interval.
1. Introduction



Statistical process control (SPC), which has been widely used in manufacturing systems to control
quality, is receiving interest in a wide range of applications such as public health and medicine
(Woodall 2006), security systems (Ye et al. 2003), condition-based maintenance (Kim et al. 2011),
environmental management (Corbett and Pan 2002) and finance (Frisen 2008).

An in-control process is often subject to the competing influences of multiple assignable causes,
which may drive the process into different out-of-control states. For example, a manufacturing
process can go out of control due to various causes such as machine faults, material variations and
human error. Depending on the cause, the out-of-control cost per unit time as well as the cost of
restoring the system back to control may vary.

The control chart reflects a key trade-off between the penalties of delayed detection and false alarms.
The presence of multiple potential out-of-control states introduces an additional level of complexity.
Since the out-of-control states differ in their impact, one is expected to be more wary of the more
costly ones. It is a great challenge to balance this trade-off in the presence of two sources of
uncertainty: the time at which the process goes out of control and the type of out-of-control state.

Process control with multiple assignable causes has been extensively studied in a non-Bayesian
framework, for both variable charts (Knappenberger and Grandage 1969, Duncan 1971, Tagaras and Lee
1988) and attribute charts (Montgomery et al. 1975, Chiu 1976, Gibra 1981, Williams et al. 1985).
However, the non-Bayesian approach is known to be sub-optimal (Taylor 1965, 1967, Vaughan 1993,
Calabrese 1995).



An alternative approach is the Bayesian control chart, originally proposed by Girshick and Rubin
(1952), which uses the posterior probability of the state of the process as a sufficient statistic for the
complete historical information. The Bayesian process control problem can be formulated as a partially
observable Markov decision process (POMDP) (Eckles 1968, Ross 1971, White 1977, Calabrese
1995, Tagaras and Nikolaidis 2002, Makis 2008, 2009). Details of POMDPs can be found in Sondik
(1971) and Smallwood and Sondik (1973), and an excellent review can be found in Monahan (1982).

Most studies of Bayesian process control assume a single assignable cause, and little is known
about the optimal policy in the presence of multiple out-of-control states. While the importance of
Bayesian process control with multiple assignable causes has been highlighted in Tagaras and
Nikolaidis (2002), the extension from a single to multiple assignable causes is very challenging
(Tagaras and Nikolaidis 2002). The belief space of the POMDP for process control with $N$ assignable
causes consists of $(N+1)$-dimensional vectors. It is widely agreed that structural results for
high-dimensional […] (Pollock 1970, Monahan 1982).
The literature on Bayesian process control with multiple assignable causes is scarce, and most
works rely on numerical methods. Tagaras and Nenes (2007) tackle the problem with two assignable
causes by investigating the economic design of two-sided Bayesian $\bar{X}$ charts over a finite horizon.
They discretize the belief space and numerically compute the optimal policy without any structural
results. Nenes and Tagaras (2007) study the same problem and show structural properties of the
optimal policy. However, the properties they show are restricted to a single-period problem.

Wang and Lee: Bayesian process control with multiple assignable causes
Article submitted to Operations Research; manuscript no. OPRE-2012-12-641

Therefore, the objective of this research is to study Bayesian process control with multiple
assignable causes. Specifically, we formulate the process control problem as a partially
observable Markov decision process, show structural properties of the optimal policy, develop a
computationally efficient algorithm, and shed light on the optimal sampling interval. The main
contribution is that, for the first time, we establish structural properties of Bayesian process control
with multiple assignable causes over an infinite horizon.

The rest of the paper is organized as follows. Section 2 introduces the model, followed by the structural
properties of the optimal policy in Section 3. In Section 4, a computationally efficient algorithm
is developed using the structural properties. In Section 5, we investigate the optimal sampling
interval. In Section 6, we demonstrate the efficiency of the proposed algorithm and present sensitivity
analyses. Section 7 concludes the research and suggests future studies.

2. The Model 

We formulate the problem as an infinite-horizon POMDP model, where the objective is to maximize 
the total expected undiscounted reward. 

2.1. Process dynamics 

We model the state of the process as a continuous-time Markov chain $\{X_t, t \ge 0\}$ with state space
$S = \{0, 1, \ldots, N\}$. State 0 is the in-control state, while the others are $N$ distinct out-of-control states.
The process starts in the in-control state and is subject to random transition into an
out-of-control state due to $N$ assignable causes $\{R_1, R_2, \ldots, R_N\}$. Each assignable cause competes
independently against the other causes to bring the system out of control, and the time it takes for
cause $R_i$ is assumed to be exponential with rate $\lambda_i$. As discussed by Tsiatis (1975), the independence
assumption is necessary to ensure the identifiability of the rates $\lambda_i$, $i = 1, 2, \ldots, N$, from historical
data. The exponential assumption is rather standard in the literature for tractability and, as argued
by many (e.g., Lorenzen and Vance 1986), can be justified when a complex system consists
of multiple components that fail independently of each other.

Another assumption is that the $N$ out-of-control states are all absorbing. That is, once the process
goes out of control, it remains in the same state until an action is taken. This assumption is
also widely accepted in the relevant literature, such as Knappenberger and Grandage (1969), Chiu
(1976) and Saniga (1977). […] out-of-control state before the transition between out-of-control states.

There are two stochastically related processes: one is the unobservable state of the
process $\{X_t, t \ge 0\}$, and the other is an observable output of the process, denoted by $\{Y_t, t \ge 0\}$, from
which we take samples every $h$ time units. It is assumed that $Y_{nh}$, $n = 0, 1, 2, \ldots$, are independent
with density $f_i(y) = f(Y_{nh} = y \mid X_{nh} = i)$, $i = 0, 1, \ldots, N$.

Let $S^N$ be the standard $N$-simplex of probability vectors (also known as the belief space):

$$S^N \triangleq \{\Pi = (\pi_0, \pi_1, \ldots, \pi_N) \in [0,1]^{N+1} : \pi_0 + \pi_1 + \cdots + \pi_N = 1\},$$

where $\pi_i$ is the posterior probability that the process is in state $i$. It is well known that the probability
vector $\Pi = (\pi_0, \pi_1, \ldots, \pi_N) \in S^N$ is a sufficient statistic for the complete historical information
about the state of the process (Åström 1965, Aoki 1965, Bertsekas 1976).

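Let $P$ be the state transition probability matrix of the process state $X_t$ between consecutive sampling
points:

$$P = \begin{pmatrix} e^{-\lambda h} & (1-e^{-\lambda h})\lambda_1/\lambda & \cdots & (1-e^{-\lambda h})\lambda_N/\lambda \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}, \qquad (1)$$

where $\lambda = \lambda_1 + \lambda_2 + \cdots + \lambda_N$.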



If $\Pi$ is the posterior probability vector at a sampling point, the prior at the next sampling epoch is
given by $\Pi P$ and hence, by Bayes' theorem, given an observation $y$, the posterior probability vector
$\Pi_h(y, \Pi)$ is

$$\Pi_h(y, \Pi) = \frac{\Pi P G(y)}{\Pi P F(y)}, \qquad (2)$$

where $G(y) = \mathrm{diag}(f_0(y), f_1(y), \ldots, f_N(y))$ is a diagonal matrix containing the observation
densities on the diagonal and $F(y) = [f_0(y), f_1(y), \ldots, f_N(y)]'$ is the vector of conditional densities.
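To make (1) and (2) concrete, here is a minimal Python sketch (ours, not the authors' implementation) of the prior $\Pi P$ and the Bayes update. It assumes normal observation densities and reads the second argument of $N(\cdot, \cdot)$ in the figure captions as a variance, following the $N(\mu, \sigma^2)$ notation of Section 6; the helper names `transition_matrix`, `normal_pdf` and `bayes_update` are hypothetical.

```python
import numpy as np

def transition_matrix(lam, h):
    """P in (1): row 0 is the in-control row; states 1..N are absorbing."""
    lam = np.asarray(lam, dtype=float)
    total = lam.sum()                      # lambda = lambda_1 + ... + lambda_N
    q = 1.0 - np.exp(-total * h)           # P(leave state 0 within one interval)
    P = np.eye(len(lam) + 1)
    P[0, 0] = 1.0 - q
    P[0, 1:] = q * lam / total
    return P

def normal_pdf(y, mean, var):
    """Normal density with the given mean and variance (vectorised over mean)."""
    return np.exp(-(y - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def bayes_update(Pi, y, P, means, var):
    """Posterior Pi_h(y, Pi) = Pi P G(y) / (Pi P F(y)) from (2)."""
    prior = np.asarray(Pi, dtype=float) @ P                  # prior at the next epoch
    F = normal_pdf(y, np.asarray(means, dtype=float), var)   # vector F(y)
    joint = prior * F                                        # row vector Pi P G(y)
    return joint / joint.sum()                               # joint.sum() = Pi P F(y)

# Figure 1 setting: an observation well below the in-control mean favours cause 1.
P = transition_matrix([0.01, 0.02], h=1.0)
post = bayes_update([1, 0, 0], -3.0, P, [0.0, -np.sqrt(2), 2 * np.sqrt(2)], 2.0)
print(np.round(post, 4))
```

Because rows 1 to $N$ of $P$ are unit vectors, a belief concentrated on an out-of-control state is left unchanged by the update, which mirrors the absorbing-state assumption.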

2.2. The Optimality Equations 

Upon sampling, at a cost $d \ge 0$, a decision on whether to stop ($a = 1$) or to continue ($a = 0$) the
process is made. If the decision is to continue, no action is taken until the next sampling
epoch, and an out-of-control cost $c_i$ is incurred per unit time, where $i$ is the unobservable state of
the process. Otherwise, the process is terminated immediately, incurring a termination cost
$T_i \ge 0$, where $i$ is again the unobservable state ($T_0$ can be interpreted as a false alarm penalty). In
addition, a reward accrues at a constant rate $r > 0$ as long as the process continues. Notice that
the reward rate should be less than the out-of-control costs when the process is out of control; that
is, $c_i > r$ for all $i = 1, 2, \ldots, N$. Without loss of generality, we assume that $c_0 = 0$ and that the
states are ordered so that $c_1 < c_2 < \cdots < c_N$. We also denote $c = [0, c_1, \ldots, c_N]'$ and
$T = [T_0, T_1, \ldots, T_N]'$ for convenience.

The objective is to find an $\mathcal{F}$-stopping time $\tau^*$ maximizing the total expected reward

$$E[R(\tau) \mid X_0 = 0],$$

where

$$R(\tau) = rh\tau - d\tau - \sum_{i=1}^{N} c_i \int_0^{\tau h} I\{X_t = i\}\,dt - \sum_{i=0}^{N} T_i\, I\{X_{\tau h} = i\}.$$

Notice that the linear reward term $rh\tau$ in $R(\tau)$ is closely related to the long-run average cost
through the so-called "$\lambda$-maximization" technique (Aven and Bergman 1986), and hence our results can
be extended to the long-run average cost criterion, which was investigated by Makis (2008) for a
problem with a single assignable cause. Therefore, our model can be considered a generalization
of some results in Makis (2008).



We consider an $m$-stage value function $\{V_m(\Pi)\}$ given by the following optimality equations:

$$V_{m+1}(\Pi) = \max\Big\{-\Pi T,\; rh - \Pi Q c h - d + \int V_m(\Pi_h(y, \Pi))\, \Pi P F(y)\, dy\Big\},$$
$$V_0(\Pi) = -\Pi T, \qquad (3)$$

where

$$Q = \begin{pmatrix} 1-\gamma & \lambda_1\gamma/\lambda & \cdots & \lambda_N\gamma/\lambda \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}, \qquad (4)$$

and the term $\gamma = 1 - (1 - e^{-\lambda h})/(\lambda h)$ is the expected fraction of time spent in any out-of-control
state within a sampling interval $h$, given that the process was in control at the beginning of the interval.
The first term on the right-hand side of the optimality equation is the expected reward of immediate
stopping; the second term is the expected reward of continuing the process to the next sampling
epoch.
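The matrix $Q$ in (4) and the factor $\gamma$ can be sanity-checked by simulation. The sketch below is our own illustration (the helper names are hypothetical; it uses the Figure 3 costs $c_1 = 1$, $c_2 = 2$ with a deliberately long interval $h = 10$): it compares $e_0 Q c h$, the expected out-of-control cost of one interval that starts in control, with a Monte Carlo estimate that draws the exponential shift time and the cause directly.

```python
import numpy as np

def gamma(total_rate, h):
    """Expected fraction of [0, h] spent out of control, starting in control."""
    x = total_rate * h
    return 1.0 - (1.0 - np.exp(-x)) / x

def cost_matrix(lam, h):
    """Q in (4): row 0 splits gamma across causes in proportion lambda_i / lambda."""
    lam = np.asarray(lam, dtype=float)
    g = gamma(lam.sum(), h)
    Q = np.eye(len(lam) + 1)
    Q[0, 0] = 1.0 - g
    Q[0, 1:] = g * lam / lam.sum()
    return Q

def simulated_interval_cost(lam, c_out, h, n=200_000, seed=1):
    """Monte Carlo out-of-control cost over one interval started in control:
    shift time ~ Exp(lambda), cause i chosen with probability lambda_i / lambda."""
    rng = np.random.default_rng(seed)
    lam = np.asarray(lam, dtype=float)
    t = rng.exponential(1.0 / lam.sum(), size=n)
    cause = rng.choice(len(lam), size=n, p=lam / lam.sum())
    c_out = np.asarray(c_out, dtype=float)          # c_1..c_N (c_0 = 0 omitted)
    return np.mean(c_out[cause] * np.clip(h - t, 0.0, None))

lam, c, h = [0.01, 0.02], np.array([0.0, 1.0, 2.0]), 10.0
exact = (cost_matrix(lam, h) @ c * h)[0]            # e_0 Q c h
estimate = simulated_interval_cost(lam, c[1:], h)
print(exact, estimate)
```

The two numbers should agree up to Monte Carlo error, which supports reading $\Pi Q c h$ as the expected out-of-control cost accumulated before the next sampling epoch.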

3. Structure of the Optimal Policy 

In this section, we present structural properties of the value function and the optimal policy of the
process control problem over a finite horizon. We then extend the problem to the infinite horizon and
present the main results. We begin with a standard result on the convexity of the value function.

Lemma 1. $V_m(\Pi)$ is a convex function of $\Pi$ for all $m \ge 0$.

Lemma 1 leads to the following theorem which eliminates trivial cases and provides analytical 
bounds on the value functions. 

Theorem 1. Let $R_0 = (\gamma c'\Lambda h - rh + d)/(1 - e^{-\lambda h}) + T'\Lambda$ and $U = [R_0, T_1, \ldots, T_N]'$, where
$\Lambda = [0, \lambda_1/\lambda, \ldots, \lambda_N/\lambda]'$.

1. If $R_0 > T_0$, then $V_m(\Pi) = -\Pi T$ for all $m \ge 0$, i.e., it is not optimal to initiate the process;

2. If $R_0 \le T_0$, the value function $V_m(\Pi)$ is bounded by hyperplanes:

$$-\Pi T \le V_m(\Pi) \le -\Pi U. \qquad (5)$$
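Because $R_0$ is in closed form, Theorem 1 is cheap to evaluate. The following is a minimal sketch under the Figure 3 parameters; `theorem1` is a hypothetical helper that returns $R_0$, the vector $U$, and which of the two cases applies.

```python
import numpy as np

def theorem1(r, d, h, lam, c, T):
    """R_0, U and the applicable case of Theorem 1; c and T are indexed by states 0..N."""
    lam, c, T = (np.asarray(v, dtype=float) for v in (lam, c, T))
    total = lam.sum()
    Lam = np.concatenate(([0.0], lam / total))      # Lambda = [0, lambda_i / lambda]'
    q = 1.0 - np.exp(-total * h)
    g = 1.0 - q / (total * h)                        # gamma
    R0 = (g * (c @ Lam) * h - r * h + d) / q + T @ Lam
    U = T.copy()
    U[0] = R0                                        # U = [R_0, T_1, ..., T_N]'
    case = "do not start" if R0 > T[0] else "bounded by (5)"
    return R0, U, case

# Figure 3 parameters with d = 1 at two sampling intervals.
for h in (10.0, 30.0):
    R0, U, case = theorem1(0.5, 1.0, h, [0.01, 0.02], [0, 1, 2], [5, 6, 10])
    print(h, round(R0, 3), case)
```

With $d = 1$, $h = 10$ lies inside the interval $(3.1, 16.5)$ reported for Figure 3b, so the sketch should report the second case, while $h = 30$ lies outside it and should trigger the first case.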



Remark 1. Calabrese (1995) derived an upper bound for the value functions in the case of a single
assignable cause by evaluating the value function for the action $a = 0$. This technique was later
adopted by Tagaras and Nikolaidis (2002) to prove a similar result. However, we found that this
technique can be used only if $c_1 = c_2 = \cdots = c_N$, because when the $c_i$'s are asymmetric, the iterations
of the continuation value function no longer converge.



Remark 2. Our bounds are tighter than those in Calabrese (1995), Tagaras and Nikolaidis (2002),
and Makis (2008). In fact, $-R_0$ can be interpreted as the maximum expected reward when the state
of the process is perfectly observable, which is an upper bound for the reward of the partially observable
process. Hence, under the first condition ($-R_0 < -T_0$) of Theorem 1, it is more economical to stop
immediately and pay the false alarm penalty $T_0$ (a reward of $-T_0$) than to continue for even a single
period for a reward less than $-R_0$.
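The interpretation of $-R_0$ in Remark 2 can be checked by simulation. The sketch below is our own illustration (function names are hypothetical; Figure 3 parameters with $d = 1$, $h = 10$): it simulates $R(\tau)$ for the policy that sees the true state at each sampling epoch, continues while in control, and stops at the first epoch after the shift. Its sample mean should approach $-R_0$.

```python
import numpy as np

def r0_bound(r, d, h, lam, c, T):
    """R_0 of Theorem 1 (c and T indexed by states 0..N)."""
    lam, c, T = (np.asarray(v, dtype=float) for v in (lam, c, T))
    total = lam.sum()
    Lam = np.concatenate(([0.0], lam / total))
    q = 1.0 - np.exp(-total * h)
    g = 1.0 - q / (total * h)
    return (g * (c @ Lam) * h - r * h + d) / q + T @ Lam

def perfect_information_reward(r, d, h, lam, c, T, n=200_000, seed=7):
    """Average R(tau) when the state is revealed at every sampling epoch and the
    process is stopped at the first epoch after it leaves state 0."""
    rng = np.random.default_rng(seed)
    lam, c, T = (np.asarray(v, dtype=float) for v in (lam, c, T))
    t = rng.exponential(1.0 / lam.sum(), size=n)                 # time of the shift
    cause = 1 + rng.choice(len(lam), size=n, p=lam / lam.sum())  # state entered
    tau = np.ceil(t / h)                                         # intervals run
    reward = (r * h - d) * tau - c[cause] * (tau * h - t) - T[cause]
    return reward.mean()

args = (0.5, 1.0, 10.0, [0.01, 0.02], [0, 1, 2], [5, 6, 10])   # Figure 3, d = 1
print(-r0_bound(*args), perfect_information_reward(*args))
```

The agreement follows because the number of intervals is geometric with mean $1/(1 - e^{-\lambda h})$ and each interval that starts in control contributes $rh - d - \gamma h c'\Lambda$ in expectation, which is exactly how $R_0$ is assembled.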

Remark 3. Notice that $R_0$ increases with the sampling cost $d$ and, once the sampling interval $h$ is
sufficiently large, also with $h$. That is, as the sampling interval and/or cost becomes sufficiently large, it is
always optimal to stop the process right at the beginning rather than to end up paying the out-of-control
cost. More details are discussed in Section 5.

Because $V_m(\Pi)$ is the value function for the $m$-stage stopping problem, we have $V_{m+1}(\Pi) \ge V_m(\Pi)$
for all $m \ge 0$. As $V_m(\Pi)$ is bounded from above, $\lim_{m \to \infty} V_m(\Pi)$ exists and satisfies the following
optimality equation:

$$V(\Pi) = \max\Big\{-\Pi T,\; rh - \Pi Q c h - d + \int V(\Pi_h(y, \Pi))\, \Pi P F(y)\, dy\Big\}. \qquad (6)$$

It is straightforward to show that the convexity of $V(\Pi)$ is preserved.

The following lemma will lead to our main result that the optimal policy divides the probability
simplex $S^N$ into no more than two individually connected regions.

Lemma 2. It is optimal to stop when $\sum_{i=1}^{N} \pi_i = 1$.

Lemma 2 states that it is always optimal to stop the process when the process is in an out-of-control
state with probability 1.

For $(\pi_0, \ldots, \pi_N) \in S^N$, let $\Pi_{(-i)}$ be the $(N-1)$-dimensional vector of probabilities defined by

$$\Pi_{(-i)} = (\pi_1, \pi_2, \ldots, \pi_{i-1}, \pi_{i+1}, \ldots, \pi_N), \quad \forall i = 1, \ldots, N.$$
Figure 1 A projected view of the stopping region $\Gamma$ (shaded) and continuation region $S^N \setminus \Gamma$ (unshaded) for
parameters $r = 5$, $d = 0$, $T_0 = 50$, $T_1 = 60$, $T_2 = 100$, $c_1 = 10$, $c_2 = 20$, $\lambda_1 = 0.01$, $\lambda_2 = 0.02$, $h = 1$,
$f_0(y) = N(0, 2)$, $f_1(y) = N(-\sqrt{2}, 2)$, $f_2(y) = N(2\sqrt{2}, 2)$

Theorem 2. (Conditional control limit policy) Conditional on any given vector $\Pi_{(-i)}$, there
exists a control limit $B_i(\Pi_{(-i)})$ such that it is optimal to stop ($a^* = 1$) if $\pi_i \ge B_i(\Pi_{(-i)})$, and to continue
($a^* = 0$) otherwise. Moreover, the control limit function $B_i(\Pi_{(-i)})$ is convex and non-increasing in
each component of $\Pi_{(-i)}$.

The optimal policy divides the probability simplex $S^N$ into no more than two individually connected
regions: a convex stopping region $\Gamma$ and a continuation region $S^N \setminus \Gamma$. Examples are shown
in Figure 1 (for $N = 2$) and Figure 2 (for $N = 3$). The control limit function $B_i(\Pi_{(-i)})$ acts
as a "shield" against multiple out-of-control states in the probability simplex $S^N$. This structure
allows a simple representation of the control regions, which is especially useful in process control
with a large number of assignable causes. Recall that under the first condition of Theorem 1, the
continuation region is empty, i.e., $\Gamma = S^N$.

In the following proposition, we present closed-form sufficient conditions for the optimality of each of
the two control actions, stopping ($a^* = 1$) and continuing ($a^* = 0$), which can be interpreted as
analytical bounds for the control limits.
Figure 2 A projected view of the stopping region $\Gamma$ (shaded) and continuation region $S^N \setminus \Gamma$ (unshaded) for
parameters $r = 5$, $d = 0$, $T_0 = 50$, $T_1 = 60$, $T_2 = 70$, $T_3 = 80$, $c_1 = 10$, $c_2 = 15$, $c_3 = 20$, $\lambda_1 = 0.01$,
$\lambda_2 = 0.02$, $\lambda_3 = 0.03$, $h = 1$, $f_0(y) = N(0, 2)$, $f_1(y) = N(-\sqrt{2}, 2)$, $f_2(y) = N(\ldots\sqrt{2}, 2)$, $f_3(y) = N(3\sqrt{2}, 2)$



Proposition 1.

1. It is optimal to stop the process if $\Pi(Qch + PU - T) + d > rh$;

2. It is optimal to continue the process if $\Pi(Qch + PT - T) + d < rh$.

Recall again that it is optimal (by Theorem 1) to stop the process at time 0 if $R_0 > T_0$, in which
case there is no $\Pi \in S^N$ satisfying the condition in Part 2 of Proposition 1. Otherwise, the
inequality in Part 2 of Proposition 1 says that if the process continues until the next sampling
point, the reward $rh$ will exceed the sum of the expected out-of-control cost $\Pi Qch$, the
incremental termination cost $\Pi(PT - T)$ due to continuing until the next sampling epoch, and
the sampling cost $d$.
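Both conditions of Proposition 1 are linear in $\Pi$, so they can be applied directly to any belief. The sketch below assumes the Figure 3 parameters with $h = 1$ and $d = 0$; the helper names are hypothetical. Beliefs that satisfy neither condition lie between the two analytical bounds and still require value iteration.

```python
import numpy as np

def matrices(r, d, h, lam, c, T):
    """P, Q and U from (1), (4) and Theorem 1 (states 0..N)."""
    lam, c, T = (np.asarray(v, dtype=float) for v in (lam, c, T))
    total = lam.sum()
    q = 1.0 - np.exp(-total * h)
    g = 1.0 - q / (total * h)
    P = np.eye(len(c))
    P[0, 0], P[0, 1:] = 1.0 - q, q * lam / total
    Q = np.eye(len(c))
    Q[0, 0], Q[0, 1:] = 1.0 - g, g * lam / total
    Lam = np.concatenate(([0.0], lam / total))
    U = T.copy()
    U[0] = (g * (c @ Lam) * h - r * h + d) / q + T @ Lam   # R_0
    return P, Q, U, c, T

def proposition1(Pi, r, d, h, lam, c, T):
    """Apply the two sufficient conditions of Proposition 1 to a belief Pi."""
    P, Q, U, c, T = matrices(r, d, h, lam, c, T)
    Pi = np.asarray(Pi, dtype=float)
    if Pi @ (Q @ c * h + P @ U - T) + d > r * h:            # Part 1
        return "stop"
    if Pi @ (Q @ c * h + P @ T - T) + d < r * h:            # Part 2
        return "continue"
    return "undetermined"          # between the two analytical bounds

params = (0.5, 0.0, 1.0, [0.01, 0.02], [0, 1, 2], [5, 6, 10])  # Figure 3, h = 1
for Pi in ([1, 0, 0], [0.5, 0.5, 0], [0, 0, 1]):
    print(Pi, proposition1(Pi, *params))
```

The belief concentrated on state 2 is classified as a stop, in line with Lemma 2, while the in-control belief satisfies the continuation condition.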

4. Computation of the Optimal Policy 

The control charts for continuous measurements are computed by discretizing the belief space,
because standard POMDP algorithms such as the Sondik algorithm (Sondik 1971) and Monahan's
algorithm (Monahan 1982) require finite measurement spaces and are therefore not directly applicable
to continuous observations. We utilize the structural properties to develop a computationally
efficient value iteration algorithm, which can be seen as a multi-dimensional generalization of the
algorithm by Calabrese (1995).

Given a stopping threshold $\epsilon > 0$, the algorithm can be described as follows.

The accelerated value iteration algorithm based on discretization

Step 0: Set $V_0(\Pi) = -\Pi T$ and $m = 1$.

Step 1: Compute $V_m(\Pi)$, $\forall \Pi \in S^N \setminus \Gamma_m$, using the optimality equations (3).

Step 2: Set $V_m(\Pi) = -\Pi T$, $\forall \Pi \in \Gamma_m$.

Step 3: If $\max_{\Pi \in S^N} |V_{m-1}(\Pi) - V_m(\Pi)| \ge \epsilon$, then set $m = m + 1$ and go to Step 1; otherwise, set
$V(\Pi) = V_m(\Pi)$, $\forall \Pi \in S^N$.

An example with two assignable causes ($N = 2$) best illustrates Step 1 of the algorithm.
Suppose that we discretize the belief space with a step size $\Delta$. In Step 1, we start at $\pi_1 = \pi_2 = 0$
and compute $V_m(\pi_1, \pi_2)$ as we gradually increase $\pi_1$ by the step size from 0 to $\pi_1 = \Delta, 2\Delta, \ldots$
until the control limit $B_1(\pi_2 = 0)$ is reached. Then we increase $\pi_2$ by the step size to $\pi_2 = \Delta$,
increase $\pi_1$ again from 0, and find the values of $V_m(\pi_1, \pi_2 = \Delta)$ until the control limit $B_1(\pi_2 = \Delta)$
is reached. Then $\pi_2$ is increased again to $\pi_2 = 2\Delta$, and so on. The procedure continues until we
reach $\pi_2 \ge \bar{\pi}_2$, where $B_1(\bar{\pi}_2) = 0$.

The structural properties in Theorem 2 allow us to compute the value function $V_m(\Pi)$ as a simple
product of the two vectors $-\Pi$ and $T$ for all $\Pi$ in the stopping region $\Gamma_m$, as shown in Step 2
of the algorithm. This is where the above algorithm gains its computational efficiency. Intuitively,
the efficiency gain becomes significant when the stopping region is large, which is discussed
in detail in Section 6.



When the measurements take discrete values, as in the case of attribute charts (Calabrese
1995), standard POMDP algorithms can be applied (Sondik 1971). In these algorithms, value
iteration is conducted by iterating so-called $\alpha$-vectors. However, since the number of $\alpha$-vectors
can grow exponentially with the time horizon, the computation may be intensive for our infinite-horizon
problem. In this case, the following lemma becomes useful in finding the truncation horizon for
approximation.

Lemma 3. Let $V_m(\Pi)$ and $V^U_m(\Pi)$ denote the $m$-stage value functions with initial values $V_0(\Pi) = -\Pi T$
and $V^U_0(\Pi) = -\Pi U$, respectively. Let $\Gamma_m$ and $\Gamma^U_m$ denote the corresponding stopping regions;
then the following statements hold for all $\Pi \in S^N$ and all $m$:

1. $V_{m+1}(\Pi) \ge V_m(\Pi)$, $\Gamma_{m+1} \subseteq \Gamma_m$.

Lemma 3 states that the boundaries of the stopping regions corresponding to value iterations with
different initial values (one starting from the upper bound, the other from the lower bound)
converge to the optimal control boundary from opposite directions. The gap between the value
functions can be useful when we try to balance the quality of the solution with the amount of
computation. However, since the current trend in industry is moving away from attribute charts
toward continuous measurements (Woodall and Montgomery 1999), the accelerated discretization
algorithm proposed earlier would have broader applications.

5. Optimal Sampling Interval 



Although some authors (Taylor 1965, 1967, Tagaras and Nikolaidis 2002) advocate adaptive
sampling schemes based on posterior state probabilities, current practice is still dominated by
fixed sampling schemes. Therefore, we investigate the optimal fixed sampling interval, in line with
Knappenberger and Grandage (1969) and Carter (1972), among others. In the following
proposition, we present bounds on the optimal sampling interval, which can narrow the range of
the search.



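Proposition 2. The optimal sampling interval $h^*$ should satisfy the following inequality:

$$\frac{\gamma c'\Lambda h^* - r h^* + d}{1 - e^{-\lambda h^*}} + T'\Lambda \le T_0, \qquad (7)$$

where $\gamma = 1 - (1 - e^{-\lambda h^*})/(\lambda h^*)$.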

Proposition 2 follows immediately from Theorem 1. Figure 3 illustrates the ranges of sampling intervals
in which the condition in Proposition 2 is satisfied. In Figure 3a, the sampling cost is $d = 0$ and
condition (7) is satisfied for $h \in (0, 20.2)$. Notice that the optimal sampling interval is arbitrarily
small, as is evident from the monotonicity of the maximum expected reward curve. In Figure 3b, the
sampling cost is $d = 1$ and the range of $h$ satisfying the condition is $(3.1, 16.5)$. Notice that the
maximum reward curve has a plateau outside of the identified interval. When the sampling interval
is sufficiently large, it is optimal not to initiate the process (or to stop at time 0) rather than to
pay high out-of-control costs until the next sampling epoch. When the sampling interval is too
small, it is also optimal not to initiate the process, to avoid the high cost of frequent sampling.

(a) $d = 0$
Figure 3 The maximum expected reward (with its upper bound $-R_0$ and lower bound $-T_0$) as a function of the
sampling interval $h$ ($N = 2$, $r = 0.5$, $c_1 = 1$, $c_2 = 2$, $T_0 = 5$, $T_1 = 6$, $T_2 = 10$, $\lambda_1 = 0.01$, $\lambda_2 = 0.02$)



Notice in Figure 3a that the maximum reward is monotonically decreasing in $h$, since more frequent
sampling provides more information at no additional cost. Figure 4 illustrates how the maximum
reward, as a function of the sampling interval, changes with the sampling cost
$d = 0, 0.1, 0.2, \ldots, 1$; we found that the optimal sampling interval $h^*$ is increasing in $d$.
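Condition (7) only requires evaluating $R_0$ as a function of $h$, so the admissible range can be located by a simple scan. The following sketch uses the Figure 3 parameters; the function names are hypothetical, and reporting the smallest and largest admissible grid points assumes the admissible set is a single interval.

```python
import numpy as np

def R0(h, r, d, lam, c, T):
    """R_0 of Theorem 1 as a function of the sampling interval h."""
    lam, c, T = (np.asarray(v, dtype=float) for v in (lam, c, T))
    total = lam.sum()
    Lam = np.concatenate(([0.0], lam / total))
    q = 1.0 - np.exp(-total * h)
    g = 1.0 - q / (total * h)
    return (g * (c @ Lam) * h - r * h + d) / q + T @ Lam

def admissible_range(r, d, lam, c, T, grid):
    """Smallest and largest grid point h satisfying (7), i.e. R_0(h) <= T_0."""
    ok = [h for h in grid if R0(h, r, d, lam, c, T) <= T[0]]
    return (min(ok), max(ok)) if ok else None

grid = np.arange(0.01, 40.0, 0.01)
fig3 = dict(r=0.5, lam=[0.01, 0.02], c=[0, 1, 2], T=[5, 6, 10])
for d in (0.0, 1.0):                    # panels (a) and (b) of Figure 3
    print(d, admissible_range(d=d, grid=grid, **fig3))
```

With these parameters the scan should land close to the intervals $(0, 20.2)$ and $(3.1, 16.5)$ quoted for Figures 3a and 3b. It also shows that for $d > 0$, $R_0$ exceeds $T_0$ at very small $h$, because the term $d/(1 - e^{-\lambda h})$ grows without bound as $h \to 0$.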

6. Numerical Studies 

In this section, we present numerical studies on the computational efficiency of the proposed
algorithm, as well as the sensitivity of the optimal reward to various parameters that might be
misspecified. To this end, we introduce new parameters indicating the sizes of the shifts ($\delta_i$ for
out-of-control state $i$) from the in-control state to multiple out-of-control states. Specifically, we
consider the normal distribution as the density of the in-control and out-of-control states. The normal
density for the in-control state is $N(\mu, \sigma^2)$ and that of out-of-control state $i$ is $f_i(y) = N(\mu + \delta_i\sigma, \sigma^2)$
for $i = 1, 2, \ldots, N$.

Figure 4 The maximum expected reward as a function of the sampling interval $h$ for different sampling costs $d$ (the
rest of the parameters are the same as those in Figure 3).

6.1. Efficiency of the Proposed Algorithm

Acceleration factors of the proposed algorithm in comparison with standard value iteration are
presented in Table 1. The acceleration factor is the ratio of the computation time of the standard
algorithm to that of the proposed algorithm. As the efficiency comes from eliminating the computation
in the stopping region, the acceleration tends to be more significant when the stopping region
becomes large. Notice that large out-of-control costs and small shift sizes contribute to an increase
in the size of the stopping region, which is confirmed by the increasing acceleration factors in Table 1.
Also interesting is that when the shifts are in opposite directions ($\delta_1$ and $\delta_2$ have different
signs; $\delta_1 = -1$ and $\delta_2 = 1.5$ as in the second small table), it is more difficult to detect the transition
to an out-of-control state, and hence the stopping region becomes larger.
Table 1 Acceleration of computation by using the structural property ($N = 3$; the rest of the parameters are the
same as those in Figure 2).
6.2. Sensitivity Analysis

The assumption that all out-of-control states are absorbing may not be realistic, as the process
may continue to deteriorate after a transition into an out-of-control state. Therefore, we perform
a sensitivity analysis on the inter-transition rate $\lambda_{12}$, the rate of transition from out-of-control
state 1 to out-of-control state 2. To this end, we first find optimal Bayesian control charts for a process
with $\lambda_{12} \in \{0.0, 0.01, 0.02, 0.04, 0.08, 0.16\}$ and compute the exact optimal rewards for the charts.
We then use the optimal control chart found for a process with $\lambda_{12} = 0$ to control processes with
$\lambda_{12} \in \{0.01, 0.02, 0.04, 0.08, 0.16\}$. The expected total rewards are computed using simulation. In
Table 2, simulated total rewards ('Appr.') are reported along with the optimal rewards ('Exact'). The
error is insignificant in most cases, except when $\lambda_{12}$ is large but $\lambda_2$ is much smaller than $\lambda_1$, in
which case it is much more likely for the in-control process to move to state 2 via state 1 than to move
directly to state 2. In this case, the process is said to be subject to a sequence of causes rather than
to competing causes.
Another observation, which is related to the sensitivity to the direction of the shifts, is that the
optimal reward is larger when the out-of-control states are defined by shifts in the same direction
(i.e., $\delta_1$ and $\delta_2$ have the same sign). An intuitive explanation is that the out-of-control states are
then easier to detect, and hence the total rewards are larger.

The sensitivity analysis on the misspecification of out-of-control costs is shown in Table 3. We
first find the optimal control chart for the 'Assumed' out-of-control costs and then run simulations
with the 'Actual' out-of-control costs. 'Err(%)' is the sub-optimality in percentage. Noticeably large
errors are observed when the out-of-control costs are severely under-estimated. Also, the error is
smaller for larger shifts ($\delta_1 = 1$, $\delta_2 = 2$) than for smaller shifts ($\delta_1 = 0.5$, $\delta_2 = 1$).

Table 4 presents the sensitivity analysis of the total reward to the misspecification of the shift
size. The error becomes significant when the shifts are over-estimated. Also notice that the errors
are larger with larger out-of-control costs ($c_1 = 15$, $c_2 = 30$) than with smaller out-of-control costs
($c_1 = 10$, $c_2 = 15$).

7. Conclusions 

In this paper, we investigate Bayesian process control with multiple assignable causes, which
has long been discussed in the literature but for which few structural properties or analytical results
have been known. We formulate the problem as a high-dimensional POMDP over an infinite horizon
and reveal structural properties of the optimal control policies. Under standard assumptions, we show
that a conditional control limit policy is optimal for the maximization of the expected total reward. The
numerical studies show that the assumption of absorbing out-of-control states does not result in
a significant error unless the transition rates among out-of-control states are unreasonably large
compared with the other transition rates.



Tagaras and Nikolaidis (2002) state that "the optimal monitoring policy may turn out to be
extremely complex and impractical." However, we find that under a fixed sampling scheme the
optimal policy splits the belief space into no more than two individually connected regions: one for
stopping and the other for continuation. This simple structure leads to a considerable reduction
in computation. Furthermore, we derive analytical bounds on the control limits and the optimal
sampling interval.

Table 2 Comparison of simulated total reward between the exact control chart ($\lambda_{12} > 0$) and the approximated
chart (assuming $\lambda_{12} = 0$) for different values of $\lambda_{12}$ ($r = 5$, $T_0 = 10$, $T_1 = 20$, $T_2 = 30$, $d = 0$).
Each entry is Appr. / Exact.

$\delta_1 = -1$, $\delta_2 = 2$, $c_1 = 10$, $c_2 = 15$
$\lambda_{12}$ | $\lambda_1 = 0.01$, $\lambda_2 = 0.02$ | $\lambda_1 = 0.01$, $\lambda_2 = 0.08$ | $\lambda_1 = 0.08$, $\lambda_2 = 0.01$
0.01 | 98.35 / 98.51 | 6.67 / 6.70 | 10.14 / 10.28
0.02 | 98.86 / 99.88 | 6.83 / 7.08 | 10.01 / 10.14
0.04 | 98.54 / 99.48 | 6.62 / 7.06 | 9.31 / 9.33
0.08 | 98.57 / 99.54 | 6.88 / 6.94 | 7.67 / 8.53
0.16 | 98.12 / 99.83 | 6.73 / 6.87 | 5.49 / 7.22

$\delta_1 = -1$, $\delta_2 = 2$, $c_1 = 10$, $c_2 = 30$
$\lambda_{12}$ | $\lambda_1 = 0.01$, $\lambda_2 = 0.02$ | $\lambda_1 = 0.01$, $\lambda_2 = 0.08$ | $\lambda_1 = 0.08$, $\lambda_2 = 0.01$
0.01 | 75.27 / 75.89 | -2.49 / -2.37 | 7.35 / 7.63
0.02 | 74.51 / 74.85 | -2.50 / -2.48 | 6.61 / 6.94
0.04 | 72.95 / 73.63 | -2.61 / -2.41 | 5.11 / 5.48
0.08 | 71.19 / 72.71 | -2.67 / -2.61 | 1.98 / 4.23
0.16 | 68.47 / 70.52 | -2.70 / -2.68 | -2.56 / 2.46

$\delta_1 = 1$, $\delta_2 = 2$, $c_1 = 10$, $c_2 = 15$
$\lambda_{12}$ | $\lambda_1 = 0.01$, $\lambda_2 = 0.02$ | $\lambda_1 = 0.01$, $\lambda_2 = 0.08$ | $\lambda_1 = 0.08$, $\lambda_2 = 0.01$
0.01 | 101.67 / 101.70 | 8.14 / 8.26 | 11.74 / 11.79
0.02 | 102.49 / 103.13 | 8.51 / 8.56 | 11.86 / 11.95
0.04 | 102.26 / 103.24 | 8.52 / 8.54 | 11.34 / 11.57
0.08 | 102.92 / 103.23 | 8.35 / 8.38 | 11.06 / 11.11
0.16 | 103.18 / 103.67 | 8.19 / 8.24 | 10.29 / 10.30

$\delta_1 = 1$, $\delta_2 = 2$, $c_1 = 10$, $c_2 = 30$
$\lambda_{12}$ | $\lambda_1 = 0.01$, $\lambda_2 = 0.02$ | $\lambda_1 = 0.01$, $\lambda_2 = 0.08$ | $\lambda_1 = 0.08$, $\lambda_2 = 0.01$
0.01 | 79.65 / 79.79 | -1.39 / -1.30 | 9.78 / 9.88
0.02 | 79.87 / 80.52 | -1.29 / -1.20 | 9.68 / 9.74
0.04 | 78.79 / 80.03 | -1.57 / -1.17 | 8.80 / 8.86
0.08 | 78.38 / 78.79 | -1.43 / -1.41 | 7.50 / 7.82
0.16 | … / 77.20 | … / … | … / …

Table 3 Sensitivity to misspecification of $c_1$ and $c_2$ ($r = 5$, $\lambda_1 = 0.02$, $\lambda_2 = 0.01$, $T_0 = 10$, $T_1 = 20$, $T_2 = 30$,
$d = 0$, $h = 1$). Rows give the assumed costs, columns the actual costs; each entry is Reward / Err(%).

$\delta_1 = 0.5$, $\delta_2 = 1$
Assumed \ Actual | $c_1 = 10$, $c_2 = 10$ | $c_1 = 10$, $c_2 = 20$ | $c_1 = 15$, $c_2 = 20$ | $c_1 = 20$, $c_2 = 30$
$c_1 = 10$, $c_2 = 10$ | 71.30 / 0.00 | 57.15 / 2.25 | 20.78 / 45.82 | -27.04 / 223.97
$c_1 = 10$, $c_2 = 20$ | 70.01 / 1.81 | 58.47 / 0.00 | 30.18 / 21.32 | -7.21 / 133.05
$c_1 = 15$, $c_2 = 20$ | 61.16 / 14.22 | 53.91 / 7.79 | 38.36 / 0.00 | 16.03 / 26.50
$c_1 = 20$, $c_2 = 30$ | 45.69 / 35.92 | 41.81 / 28.49 | 33.35 / 13.06 | 21.81 / 0.00

$\delta_1 = 1$, $\delta_2 = 2$
Assumed \ Actual | $c_1 = 10$, $c_2 = 10$ | $c_1 = 10$, $c_2 = 20$ | $c_1 = 15$, $c_2 = 20$ | $c_1 = 20$, $c_2 = 30$
$c_1 = 10$, $c_2 = 10$ | 104.77 / 0.00 | 96.68 / 0.14 | 72.65 / 7.14 | 38.33 / 33.25
$c_1 = 10$, $c_2 = 20$ | 104.56 / 0.20 | 96.82 / 0.00 | 75.76 / 3.16 | 44.33 / 22.81
$c_1 = 15$, $c_2 = 20$ | 100.58 / 3.99 | 95.67 / 1.18 | 78.24 / 0.00 | 53.64 / 6.59
$c_1 = 20$, $c_2 = 30$ | 91.79 / 12.38 | 88.61 / 8.47 | 75.96 / 2.91 | 57.43 / 0.00

Table 4 Sensitivity to misspecification of $\delta_1$ and $\delta_2$ ($r = 5$, $\lambda_1 = 0.02$, $\lambda_2 = 0.01$, $T_0 = 10$, $T_1 = 20$, $T_2 = 30$,
$d = 0$, $h = 1$). Rows give the assumed shifts, columns the actual shifts; each entry is Reward / Err(%).

$c_1 = 10$, $c_2 = 15$
Assumed \ Actual | $\delta_1 = 0.5$, $\delta_2 = 1$ | $\delta_1 = 1$, $\delta_2 = 1.5$ | $\delta_1 = 1.5$, $\delta_2 = 2$ | $\delta_1 = 2$, $\delta_2 = 3$
$\delta_1 = 0.5$, $\delta_2 = 1$ | 64.17 / 0.00 | 91.48 / 5.32 | 103.11 / 10.42 | 117.18 / 6.97
$\delta_1 = 1$, $\delta_2 = 1.5$ | 49.52 / 22.82 | 96.63 / 0.00 | 112.42 / 2.33 | 124.81 / 0.92
$\delta_1 = 1.5$, $\delta_2 = 2$ | 23.23 / 63.79 | 95.53 / 1.13 | 115.11 / 0.00 | 125.13 / 0.66
$\delta_1 = 2$, $\delta_2 = 3$ | 4.41 / 93.12 | 93.08 / 3.67 | 112.96 / 1.86 | 125.97 / 0.00

$c_1 = 15$, $c_2 = 30$
Assumed \ Actual | $\delta_1 = 0.5$, $\delta_2 = 1$ | $\delta_1 = 1$, $\delta_2 = 1.5$ | $\delta_1 = 1.5$, $\delta_2 = 2$ | $\delta_1 = 2$, $\delta_2 = 3$
$\delta_1 = 0.5$, $\delta_2 = 1$ | 31.11 / 0.00 | 55.16 / 14.28 | 68.70 / 24.23 | 88.16 / 19.94
$\delta_1 = 1$, $\delta_2 = 1.5$ | 10.19 / 67.24 | 64.35 / 0.00 | 86.38 / 4.73 | 106.84 / 2.98
$\delta_1 = 1.5$, $\delta_2 = 2$ | -32.40 / 204.14 | 60.04 / 6.69 | 90.67 / 0.00 | 109.26 / 0.79
$\delta_1 = 2$, $\delta_2 = 3$ | -68.64 / 320.63 | 53.96 / 16.14 | 88.00 / 2.94 | 110.13 / 0.00

There are multiple directions in which our work can be extended. First, the sampling interval 
can be dynamically adjusted based on posterior probabilities. Second, the assumption that all the 
out-of-control states are absorbing can be relaxed. Finally, the optimal process control problem 
can be studied over a finite horizon. 



Appendix. Proofs of Lemmas and Theorems 

A. Proof of Lemma 1

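Proof. For given $\Pi, \Pi_1 \in S^N$ and $0 \le \rho \le 1$, define

$$\rho_0 = \frac{\rho\,\Pi P F(y)}{\rho\,\Pi P F(y) + (1-\rho)\,\Pi_1 P F(y)}.$$

Clearly, $0 \le \rho_0 \le 1$. According to the Bayesian updating equation, we have

$$\Pi_h\big(y, \rho\Pi + (1-\rho)\Pi_1\big) = \rho_0\,\Pi_h(y, \Pi) + (1-\rho_0)\,\Pi_h(y, \Pi_1).$$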
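The mixture identity can be checked numerically. The sketch assumes the update has the form $\Pi_h(y,\Pi) = \Pi P\,\mathrm{diag}(F(y)) / \Pi P F(y)$, with $F(y)$ the column vector of observation densities; the transition matrix `P` and the normal observation model (`MEANS`, unit variance) are hypothetical choices, not the paper's parameters.

```python
import numpy as np

# Hypothetical one-interval transition matrix P (rows sum to 1; states 1 and 2 absorbing).
P = np.array([[0.90, 0.06, 0.04],
              [0.00, 1.00, 0.00],
              [0.00, 0.00, 1.00]])
# Hypothetical observation model: unit-variance normal densities with state-dependent means.
MEANS = np.array([0.0, 0.5, 1.0])


def F(y):
    """Vector (f_0(y), ..., f_N(y)) of observation densities at y."""
    return np.exp(-0.5 * (y - MEANS) ** 2) / np.sqrt(2.0 * np.pi)


def bayes_update(pi, y):
    """Assumed update: Pi P diag(F(y)) divided by the scalar Pi P F(y)."""
    unnorm = (pi @ P) * F(y)
    return unnorm / unnorm.sum()


pi_a = np.array([0.7, 0.2, 0.1])
pi_b = np.array([0.2, 0.3, 0.5])
rho, y = 0.35, 0.8

# rho_0 from the proof: the prior weight rho reweighted by the predictive densities.
w_a = rho * (pi_a @ P @ F(y))
w_b = (1.0 - rho) * (pi_b @ P @ F(y))
rho0 = w_a / (w_a + w_b)

lhs = bayes_update(rho * pi_a + (1.0 - rho) * pi_b, y)
rhs = rho0 * bayes_update(pi_a, y) + (1.0 - rho0) * bayes_update(pi_b, y)
print(np.allclose(lhs, rhs))  # True
```

The identity holds because the numerator of the update is linear in the prior, while the denominator is the same linear predictive density that defines $\rho_0$.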

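We prove the convexity by induction. For $m = 0$, $V_0(\Pi) = -\Pi T$ is convex. Assuming $V_m(\Pi)$ is convex, then

$$
\begin{aligned}
&\rho \int V_m(\Pi_h(y,\Pi))\,\Pi P F(y)\,dy + (1-\rho)\int V_m(\Pi_h(y,\Pi_1))\,\Pi_1 P F(y)\,dy \\
&\quad= \int V_m(\Pi_h(y,\Pi))\,\rho\,\Pi P F(y)\,dy + \int V_m(\Pi_h(y,\Pi_1))\,(1-\rho)\,\Pi_1 P F(y)\,dy \\
&\quad\ge \int V_m\big(\rho_0\Pi_h(y,\Pi) + (1-\rho_0)\Pi_h(y,\Pi_1)\big)\big(\rho\,\Pi P F(y) + (1-\rho)\,\Pi_1 P F(y)\big)\,dy \\
&\quad= \int V_m\big(\Pi_h(y, \rho\Pi + (1-\rho)\Pi_1)\big)\,\big(\rho\Pi + (1-\rho)\Pi_1\big) P F(y)\,dy;
\end{aligned}
$$

therefore the integral $\int V_m(\Pi_h(y,\Pi))\,\Pi P F(y)\,dy$ is convex in $\Pi$. According to the value iteration equation, $V_{m+1}(\Pi)$ is the maximum of the linear stopping reward $-\Pi T$ and a continuation reward that is linear in $\Pi$ plus this convex integral, so $V_{m+1}(\Pi)$ is also convex in $\Pi$. □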
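The induction step rests on the fact that the operator $\Pi \mapsto \int V(\Pi_h(y,\Pi))\,\Pi P F(y)\,dy$ maps convex functions to convex functions. The sketch below checks this numerically with a trapezoid rule, assuming the same Bayes-update form as in the proof; `P`, `MEANS`, the stopping costs `T`, and the second hyperplane `L` used to build a convex test function are all hypothetical. Because the pointwise inequality in the proof holds for every $y$ and the quadrature weights are positive, the discretized operator inherits convexity up to rounding.

```python
import numpy as np

# Hypothetical model ingredients (not taken from the paper).
P = np.array([[0.90, 0.06, 0.04],
              [0.00, 1.00, 0.00],
              [0.00, 0.00, 1.00]])
MEANS = np.array([0.0, 0.5, 1.0])
T = np.array([10.0, 20.0, 30.0])      # stopping costs T_0, T_1, T_2
L = np.array([-5.0, -40.0, -60.0])    # coefficients of a second hyperplane

YS = np.linspace(-8.0, 9.0, 4001)     # quadrature grid for y
DY = YS[1] - YS[0]
DENS = np.exp(-0.5 * (YS[:, None] - MEANS) ** 2) / np.sqrt(2.0 * np.pi)  # F(y) per grid row


def V(post):
    """Convex test function: max of two linear functions of each belief (row) in post."""
    return np.maximum(-post @ T, post @ L)


def G(pi):
    """Trapezoid rule for the integral of V(Pi_h(y, Pi)) * Pi P F(y) over y."""
    unnorm = (pi @ P) * DENS                 # row k: Pi P diag(F(y_k))
    pred = unnorm.sum(axis=1)                # predictive density Pi P F(y_k)
    vals = V(unnorm / pred[:, None]) * pred  # V(posterior) times predictive density
    return DY * (vals.sum() - 0.5 * (vals[0] + vals[-1]))


rng = np.random.default_rng(1)
worst_gap = -np.inf
for _ in range(200):
    a, b = rng.dirichlet(np.ones(3)), rng.dirichlet(np.ones(3))
    rho = rng.uniform()
    # Convexity requires G(rho a + (1 - rho) b) <= rho G(a) + (1 - rho) G(b).
    gap = G(rho * a + (1.0 - rho) * b) - (rho * G(a) + (1.0 - rho) * G(b))
    worst_gap = max(worst_gap, gap)
print(worst_gap <= 1e-8)  # True
```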
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B. Proof of Theorem 1

Proof. To prove this theorem, we first show that it is optimal to stop when $\pi_i = 1$ for any $i \in \{1, \ldots, N\}$. We then prove part 2 of the theorem, and finally part 1.

Let $e_i$ denote the $(N+1)$-dimensional unit vector with 1 in the $i$th component. As vertices of the belief space $S^N$, the vectors $e_i$ ($i = 1, \ldots, N$) are fixed points of the Bayesian updating (2). That is, for $i = 1, \ldots, N$, the following equations hold:



Since $r < c_i$ for $i = 1, \ldots, N$, the iteration above converges to the first term on the right-hand side:



Because the first term corresponds to the action of stopping, it is optimal to stop the process when $\pi_i = 1$ for any $i \in \{1, \ldots, N\}$. Based on this result, we next prove the bounds for the value functions.
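As an illustration only, assume the vertex recursion has the form $V_{m+1}(e_i) = \max\{-T_i,\ (r - c_i)h - d + V_m(e_i)\}$ with $V_0(e_i) = -T_i$. This form is an assumption made for the sketch, not a quotation of the equations above. The function `vertex_value` is hypothetical; the parameter values follow the Table 3 setting ($r = 5$, $T_1 = 20$, $h = 1$, $d = 0$, $c_1 = 10$).

```python
def vertex_value(T_i, r, c_i, h, d, iters=50):
    """Iterate V_{m+1}(e_i) = max(-T_i, (r - c_i) h - d + V_m(e_i)) from V_0(e_i) = -T_i.

    The recursion form is an illustrative assumption, not the paper's exact equation.
    """
    v = -T_i
    for _ in range(iters):
        v = max(-T_i, (r - c_i) * h - d + v)  # first term: stop; second term: continue
    return v


# r < c_i: continuing only adds negative reward, so the stopping term -T_i wins.
print(vertex_value(T_i=20.0, r=5.0, c_i=10.0, h=1.0, d=0.0))  # -20.0
# r > c_i (outside the theorem's assumption): the continuation term keeps growing.
print(vertex_value(T_i=20.0, r=5.0, c_i=3.0, h=1.0, d=0.0))   # 80.0
```

The second call shows why the condition $r < c_i$ matters: without it, continuing at an out-of-control vertex would never be dominated by stopping.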

The lower bound is obvious from the dynamic equation, so we focus on proving the upper bound. Let the vector $(V_m(e_0), V_m(e_1), \ldots, V_m(e_N))'$ collect the values of $V_m(\Pi)$ on all vertices of $S^N$. Since the value function $V_m(\Pi)$ is convex, it is bounded from above by the linear interpolation of these vertex values:




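$$V_m(\Pi) \le \Pi\,(V_m(e_0), V_m(e_1), \ldots, V_m(e_N))' = \sum_{i=0}^{N} \pi_i V_m(e_i), \qquad (8)$$

and, from the argument above,

$$\lim_{m\to\infty} V_m(e_i) = -T_i, \quad i = 1, \ldots, N. \qquad (9)$$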



If the decision is to continue for all $m > 0$, the value iteration of $V_m(e_0)$ becomes




(10) 



From Bayes' theorem and the convexity of $V_m(\Pi)$, we have

(11)
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Combining equations (10) and (11), we construct the iteration of a dummy variable $x_m$, which is also an upper bound for $V_m(e_0)$:

rE„+i = e"^"x,„ + (r - ^c'\)h - d - (1 - e-^'')E'„,A 
$$V_m(e_0) \le x_m \qquad (12)$$



Because $0 < e^{-\lambda h} < 1$ for $0 < h < \infty$, it is easy to show that the iteration in (12) converges. Combining this with equation (9), we have $\lim_{m\to\infty} x_m = -R_0$, so $V_m(e_0) \le -R_0$.
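The convergence step uses a standard fact: for $x_{m+1} = a x_m + b_m$ with $0 < a < 1$ and $b_m \to b$, the iterates satisfy $x_m \to b/(1-a)$, consistent with the factor $1 - e^{-\lambda h}$ appearing in the limit. The sketch takes $a = e^{-\lambda h}$ with hypothetical $\lambda$ and $h$ and a hypothetical forcing sequence $b_m$. It illustrates the contraction argument only and does not reproduce the constant terms of (12).

```python
import math


def run_recursion(a, forcing, x0=0.0):
    """Iterate x_{m+1} = a * x_m + b_m over the forcing terms b_m and return the final iterate."""
    x = x0
    for b_m in forcing:
        x = a * x + b_m
    return x


lam, h = 0.03, 1.0                       # hypothetical total transition rate and sampling interval
a = math.exp(-lam * h)                   # 0 < a < 1 whenever 0 < h < infinity
b_limit = -2.5                           # hypothetical limit of the forcing terms
forcing = [b_limit + 10.0 * 0.5 ** m for m in range(1000)]  # b_m -> b_limit
x = run_recursion(a, forcing)
print(math.isclose(x, b_limit / (1.0 - a), rel_tol=0.0, abs_tol=1e-6))  # True
```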

From equation (8), which follows from the convexity of the value function, we obtain the upper bound $V_m(\Pi) \le -\Pi(R_0, T_1, \ldots, T_N)'$, which completes the proof of part 2 of the theorem.

Notice that if $R_0 > T_0$, then $-\Pi T \ge -\Pi U$ for all $\Pi \in S^N$; in this case the value function is restricted to $V_m(\Pi) = -\Pi T$ for all $m \ge 0$ and all $\Pi \in S^N$. In this particular case, it is not optimal to start the process even when it begins in the in-control state. This proves part 1 of the theorem. □
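The comparison used for part 1 can be written out in one line. Reading $T = (T_0, T_1, \ldots, T_N)'$ and $U = (R_0, T_1, \ldots, T_N)'$ as in the upper bound of part 2 (our reading of the notation), the two vectors differ only in their first component:

$$\Pi U - \Pi T = \Pi\,(U - T) = \pi_0\,(R_0 - T_0) \ge 0 \quad \text{for all } \Pi \in S^N \text{ when } R_0 > T_0 .$$

Together with the lower bound $V_m(\Pi) \ge -\Pi T$ and the upper bound $V_m(\Pi) \le -\Pi U$, this gives $-\Pi T \le V_m(\Pi) \le -\Pi U \le -\Pi T$, hence $V_m(\Pi) = -\Pi T$.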

C. Proof of Lemma 2

Proof. We rewrite the optimality equation (6) in the simpler form $V(\Pi) = \max\{-\Pi T, V_c(\Pi)\}$, where $V_c(\Pi)$ denotes the value function of continuation. For any $\Pi \in S^{N-1} = \{(\pi_0, \pi_1, \ldots, \pi_N) \in S^N \mid \pi_1 + \cdots + \pi_N = 1\}$,

$$-\Pi T = \sum_{i=1}^{N} -T_i\,\pi_i \ge \sum_{i=1}^{N} V_c(e_i)\,\pi_i \ge V_c\Big(\sum_{i=1}^{N} e_i\,\pi_i\Big) = V_c(\Pi). \qquad (13)$$

The first inequality follows from equation (9), since $V_c(e_i) \le V(e_i) = -T_i$; the second follows from the convexity of $V_c(\Pi)$, as shown in Lemma 1. □

D. Proof of Theorem 2

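Proof. Let $V(\pi_i)|_{\Pi_{(-i)}}$ denote the value function restricted to the subset of $S^N$ on which the components $\pi_j$, $j \ne 0, i$, collected in $\Pi_{(-i)}$, are held fixed, so that $0 \le \pi_i \le 1 - \sum_{j \ne 0, i} \pi_j$. Because $V(\Pi)$ is convex, $V(\pi_i)|_{\Pi_{(-i)}}$ is also convex in $\pi_i$. From Lemma 2, it is optimal to stop at $\pi_i = 1 - \sum_{j \ne 0, i} \pi_j$, where $\pi_0 = 0$. Furthermore, because $V(\pi_i)|_{\Pi_{(-i)}}$ is the maximum of a linear function and a convex function, it is easy to show that these functions have at most one intersection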
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in the interval $0 \le \pi_i \le 1 - \sum_{j \ne 0, i} \pi_j$. The monotonicity of $B_i(\Pi_{(-i)})$ follows from the convexity and Lemma 2. □
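Along such a one-dimensional slice, the control limit is the single point where the convex continuation value drops below the linear stopping value. A minimal sketch of locating it by bisection, assuming a single sign change of the difference as in the argument above. The functions `stop` and `cont` are hypothetical stand-ins (a slice with $\pi_0 = 1 - x$, $\pi_1 = x$, $T_0 = 10$, $T_1 = 20$, and a made-up convex continuation value), not the model's value functions.

```python
import math


def control_limit(cont, stop, upper, tol=1e-12):
    """Bisection for the control limit on [0, upper].

    Assumes cont(x) - stop(x) > 0 at x = 0, <= 0 at x = upper, and changes sign once.
    """
    lo, hi = 0.0, upper
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cont(mid) - stop(mid) > 0.0:  # continuing still better: limit lies to the right
            lo = mid
        else:                            # stopping at least as good: limit lies to the left
            hi = mid
    return 0.5 * (lo + hi)


def stop(x):
    return -(10.0 * (1.0 - x) + 20.0 * x)    # linear stopping value -Pi T on the slice


def cont(x):
    return 30.0 - 90.0 * x + 20.0 * x ** 2   # hypothetical convex continuation value


b = control_limit(cont, stop, upper=1.0)
print(round(b, 6))  # 0.585786
# For these stand-ins the exact crossing is 2 - sqrt(2).
print(math.isclose(b, 2.0 - math.sqrt(2.0), abs_tol=1e-9))  # True
```

Bisection relies only on the single-crossing structure, which is why the threshold form of the policy reduces the search on each slice to a one-dimensional root-finding problem.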

E. Proof of Proposition 1

Proof. According to Theorem 1, $V_c(\Pi)$ is bounded by hyperplanes:

$$
\begin{aligned}
rh - \Pi Q c h - d - \int \Pi_h(y,\Pi)\,T\,\Pi P F(y)\,dy \;&\le\; V_c(\Pi) \;\le\; rh - \Pi Q c h - d - \int \Pi_h(y,\Pi)\,U\,\Pi P F(y)\,dy,\\
rh - \Pi(Qch + PT) - d \;&\le\; V_c(\Pi) \;\le\; rh - \Pi(Qch + PU) - d. \qquad (14)
\end{aligned}
$$

If the inequality in part 1 of Proposition 1 holds, then $V_c(\Pi) \le rh - \Pi(Qch + PU) - d < -\Pi T$, and it is optimal to stop the process. If the inequality in part 2 of Proposition 1 holds, then $V_c(\Pi) \ge rh - \Pi(Qch + PT) - d > -\Pi T$, and it is optimal to continue. □
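The two sufficient conditions, in the form used in the proof, translate directly into a classification routine. This is a sketch under stated assumptions: `classify` is a hypothetical helper, $\Pi Q c h$ is read as $(\Pi (Q c))\,h$ with $c$ a column vector, and `Q`, `P`, `c`, `T`, and `U` $= (R_0, T_1, \ldots, T_N)'$ are made-up numbers chosen so that each outcome occurs once.

```python
import numpy as np


def classify(pi, r, h, d, Q, c, P, T, U):
    """Sufficient conditions from the bounds (14); returns 'stop', 'continue', or 'undetermined'."""
    stop_value = -pi @ T                       # reward from stopping now: -Pi T
    base = r * h - (pi @ (Q @ c)) * h - d      # rh - Pi Q c h - d
    upper = base - pi @ (P @ U)                # upper bound on the continuation value
    lower = base - pi @ (P @ T)                # lower bound on the continuation value
    if upper < stop_value:
        return "stop"
    if lower > stop_value:
        return "continue"
    return "undetermined"                      # the bounds bracket -Pi T and do not decide


# Hypothetical inputs with N = 2 out-of-control states.
Q = np.array([[0.95, 0.03, 0.02], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
P = np.array([[0.90, 0.06, 0.04], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
c = np.array([0.0, 10.0, 15.0])
T = np.array([10.0, 20.0, 30.0])
U = np.array([-60.0, 20.0, 30.0])              # hypothetical R_0 = -60

for belief in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0]):
    print(belief, classify(np.array(belief), 5.0, 1.0, 0.0, Q, c, P, T, U))
# [1.0, 0.0, 0.0] continue
# [0.0, 1.0, 0.0] stop
# [0.5, 0.5, 0.0] undetermined
```

Beliefs labeled "undetermined" are the ones for which the bounds alone are inconclusive and the value function must be evaluated.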
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